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1.  Background 


The  response  (y)  of  an  armor  target  to  a  ballistic  threat  can  be  characterized  as  penetration 
(y  =  1)  or  non-penetration  (y  =  0).  This  is  called  a  binary  quantal  response  (QR),  since  there 
are  exactly  two  possible  outcomes.  All  other  factors  being  constant,  one  may  consider  the  effect 
of  threat  velocity  (the  stimulus)  upon  penetration  (the  response).  The  basic  model  for  this 
interaction  is  that  penetration  is  random,  shots  are  independent,  and  that  the  probability  of 
penetration  is  some  function  G  ( v )  of  velocity, 

Pr[y  =  1  |  v]  =  G(v). 

So  y  has  a  Bernoulli  distribution  conditional  on  velocity  with  Bernoulli  parameter  G (re¬ 
course,  G  is  bound  between  0  and  1,  and  one  expects  that  G  is  an  increasing  function  of  v 
G  has  the  functional  form  of  a  cumulative  distribution  function  (cdf). 

Of  particular  interest  is  the  v50,  that  velocity  for  which  the  probability  of  penetration  equals  1/2. 

G(v5o)  —  0-5  (2) 

Analyses  of  the  v50  and  other  quantities  of  interest  are  conducted  by  collecting  samples  of 
penetration  responses  and  velocities,  estimating  the  function  G ,  and  performing  statistical 
inference  on  the  results.  This  general  paradigm  is  commonly  known  as  “sensitivity  analysis.” 


(1) 

Of 

.  Thus, 


2.  The  Location-Scale  Sensitivity  Model 


Quantal  response  (penetration)  y  G  {0,1}  to  a  continuous  stimulus  (velocity)  v  was  originally 
modeled  by  using  the  standard  normal  cdf  for  a  response  function 

1 


1  f 

(z)  —  ——  exp(— it2/2)  du 
y2n  J- oo 


(3) 


and  then  estimating  parameters  m  and  5  in  the  model 

/ v  —  m\ 

E[y  I  v]=  G0  ( — - — J  =  Pr[y  =  1  |  v],  (4) 

where  y  has  the  indicated  Bernoulli  distribution,  by  the  method  of  maximum  likelihood  (ML). 
This  is  the  location-scale  parameterization.  In  general,  cdfs  Ge  of  location-scale  distributions 
have  the  form 


1 


(5) 


Gd  O)  =  G0  (^)  , 

where  G0  is  a  standardized  cdf.  Then  the  parameter  vector  is  6  —  ^J,  where  the  location 
parameter  is  /r  and  the  scale  parameter  is  er. 

The  aim  of  analysis  is  to  estimate  the  parameter  6  and  then  make  inferences  and  perform 
statistical  tests  on  meaningful  population  parameters  such  as  v50 .  With  G0  chosen  such  that 
Go(0)  —  0.5,  then  Gg (pi)  —  0.5  and  y  =  v50.  Thus,  inferences  on  the  parameter  /i  are,  in  fact, 
inferences  on  v50. 


3.  The  Linear  Sensitivity  Model 


Response  (penetration)  y  G  {0,1}  has  expected  value  depending  on  stimulus  (velocity)  v, 

E[y  I  v\  =  Go(b0  +  b±v).  (6) 

The  conditional  distribution  of  y  is  Bernoulli  with  the  indicated  mean,  so 

E[y  |  v]  -  Go(b0  +  bxv)  =  Pr[y  =  1  I  v],  (7) 


If  G0  is  an  increasing  function,  then  the  response  function  G0  must  be  a  cdf. 

This  is  a  linear  parameterization, 

G/?0)  =  Go(b0  +  b1v),  (8) 


and  the  parameter  vector  is  /?  = 


b0 

[b,\ 


.  To  admit  additional  complexity,  these  models  recognize 


the  argument  of  G0  as  a  polynomial  and  thus  have 


E [y  |  v ]  —  Go(b0  +  btv  +  b2v2  +  — I-  bkvk) 
for  finite  k ,  where  k  —  1m  the  previous  example. 


(9) 
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4.  Generalized  Linear  Model  (GLM)  Estimation 


Quantal  response  function  estimation  can  be  implemented  using  the  Generalized  Linear  Model 
(GLM)  with  binomial  response  distribution  and  appropriate  link  functions.  (See,  for  example, 
McCulloch  and  Searle.1) 

The  response  y  has  expected  value  depending  on  stimulus  X  and  /?  (both  vectors) 

E[y  I  X]  =  G0(XP).  (10) 

The  distribution  of  y  is  Bernoulli  with  the  indicated  mean,  so 

E[y  I  X]  —  G0(X(3)  =  Pr[y  =  1  \  X].  (11) 

For  sensitivity  modeling  the  link  function  G0  is  usually  taken  to  be  a  continuous  cdf. 

Some  authors  call  the  inverse  function  Q0  —  Gy 1  the  link. 

GLM  estimates  the  linear  coefficients  /?  in  the  linear  model 

Q0  E[y  I  x]  =  xp.  (12) 

Common  choices  for  G0  include  the  normal  cdf  (probit  link) 

1  fz 

G0  (z)  =  -=  exp(— it2/2)  du  (13) 

V 2tc  J —co 

and  the  logistic  cdf  ( logit  link) 


G0  0)  = 


1 

1  +  e-z' 


(14) 


5.  Reparameterization 

Estimation  and  inference  on  the  location-scale  parameter  6  is  of  particular  interest,  since 
H  =  v50,  but  the  usual  GLM  estimate  provides  the  linear  parameter  /?  and  its  variance  estimate 
Vp.  Since 

'McCulloch,  C.;  Searle,  S.  Generalized,  Linear,  and  Mixed  Models',  John  Wiley  &  Sons:  New  York,  2001. 
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(15) 


V  —  fi  1 

b0  +  br  v  = - = - Y—v , 

o  a  o 


the  location- scale  parameter  is  obtained  by 


6  = 


bo/bt 

l  1/bJ 


(16) 


The  location-scale  variance  estimate  is  given  by  the  variance  transformation  of  equation  C-19  in 
appendix  C. 


dd 1 

V°=WV» 


ae 

dp' 


The  required  derivative  is 


~dg./db0 

do  j  db0 

-l/b1  0  ' 

1 

'  b±  O' 

_d^/db1 

do/db1_ 

b0/bl  -1/bl 

~~bf 

~b0  1. 

(17) 


(18) 


In  practice,  to  avoid  numerical  instability,  computations  are  conducted  using  the  standardized 
stimulus 


v~vm 
u  =  - 

Vs 


(19) 


where  vm  and  vs  are,  respectively,  the  sample  mean  and  standard  deviation  of  v,  and  the 

[a0i 

a  J  so  that 


,  (  aivm\,  ai  ^ , 

a0  +  aqu  =  a0 - -I - v  =  b0  +  btv  — 

V  vv  /  w 


V~[l  [1  1 

- = - Y—v. 

a  a  a 


(20) 


Thus,  the  location-scale  parameter  6  and  its  variance  estimate  Ve  can  be  recovered  from  the 
standardized  linear  parameter  a  and  its  variance  estimate  Va.  These  are 


g  —  W]  —  VsCLo/CLi 

~  la\-  [  vs/ ax 


(21) 


and 


4 


(22) 


V0  = 


d0t  dd 

da  da’ 


where 


dfj.  j  d  Uq 

do  j d  (Zq 

- -vs/a1  0 

_  _^s_ 

ax  O' 

d\xjdax 

do  j  da± 

vsa0/al  —vs/a\_ 

af 

~a0  1 

(23) 


Also,  if  need  be,  the  usual  linear  parameter  /?  and  its  variance  estimate  Vp  can  be  recovered  from 
the  standardized  linear  parameter  a  and  its  variance  estimate  Va.  These  are 


and 


\bo] 

a-0  -aiVm/vs 

M 

.  (Ii/Vs 

dp1  dp 
da  da’ 


where 


dp 

da 


db0  /  dci,Q 

db1/da0 

i 

0 

1 

o 

d  j  d  ct'i 

db1jda1_ 

-Vm/Vs 

Vvs. 

Vs 

-Vm  1- 

(24) 


(25) 


(26) 


6.  Confidence  Intervals 


First,  consider  velocity  confidence  intervals  on  vp  for  fixed  probability  of  penetration  p  (see 
figure  1).  The  response  function  gives  probability  of  penetration  at  velocity  v, 

Gg(v)  =  G0  (~~~~)  >  where  9  =  [a\  (27) 

The  nominal  point  estimate  of  vp,  the  velocity  at  which  the  probability  of  penetration  is  p,  is 
expressed  in  terms  of  the  quantile  function  Q  =  G~x  as 

vp  —  Qe(p)  =  P  +  oQ0  (p)  =  KB,  where  K  =  [1  Q0(p)].  (28) 

Confidence  intervals  are  calculated  from  the  estimator  distribution  9  ~  N2{Mq,  Vg).  So 

K6  —  N^KMq^VqK1).  (29) 
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Figure  1.  Estimate  and  normal  theory  vp  confidence  intervals. 


Since  K6  ~  KMg  +  JkVqK1  ■  Z  with  Z  ~  iV(0,l),  it  follows  that  a  100(1  —  a)%  two-sided 
confidence  interval  on  vp  is  given  in  the  usual  manner  by 

KMg  -  y/KVgKt  •  Zx_a/2  and  KMg  +  jKVgK*  •  Zx_a/^.  (30) 

The  quantile  point  Zq  of  the  standard  normal  distribution  satisfies  Pr [Z  <  Zq]  —  q. 

It  is  also  possible  to  calculate  confidence  intervals  on  the  probability  of  penetration  p  for  fixed 
velocity  v  (see  figure  2).  Start  with  the  linear  parameterization  of  the  response 

Gp(y)  —  Go(b0  +  bxv)  —  G0(KI3 )  ,  where  K  —  [1  v]  and  /?  = 

Confidence  intervals  are  calculated  from  the  estimator  distribution  /?  ~  N2(Mp,  Vp).  So 

Kp  =  N^KMp.KVpK1). 


6 


Since  K/3  ~  KMp  +  J KVpK 1  ■  Z  where,  again,  Z  ~  iV( 0,1),  it  follows  that  a  100(1  —  a)%  two 
sided  confidence  interval  on  p  is  given  by 


G0  (KMp  -  jKVpKt  •  Z!_a/2)  and  G0  ( ' KM p  +  IkV^K*  ■  Z,_a/2J 


(33) 


Figure  2.  Estimate  and  normal  theory  p  confidence  intervals. 


7.  Hypothesis  Testing 


The  theory  of  appendix  B  provides  a  means  for  hypothesis  tests  on  response  curve  parameters, 
and,  in  particular,  for  comparing  v50  estimates  against  each  other. 

The  quadratic  forms  derived  from  asymptotic  normal  distributions  of  maximum  likelihood 
estimators  are  known  as  Wald’s  Statistics,  and  the  tests  are  called  Wald’s  Tests. 
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Suppose  that  n  experiments  and  have  given  n  response  curve  parameter  estimates  ( (9, :  i  —  1, ,  n] 
with,  for  example,  the  probit  (cumulative  normal)  response  function  in  terms  of  the  usual 
location-scale  parameter  6  —  ^J,  where  0L  —  [  s.1]-  So  the  model  for  the  response  is  cumulative 
normal  with  mean  fi  and  standard  deviation  er. 

Since  the  mean  is  the  median,  ji  =  v50.  The  other  parameter  a  characterized  the  steepness  of  the 
response  curve.  Inferences  about  the  parameters  are  inferences  about  v50  and  the  steepness  of 
the  response.  So  there  are  n  sets  of  estimates  and  their  covariance  matrices.  For  i  —  1, ... ,  n. 


and 


vill 

Vil  2 


vil2 

Ut22-T 


In  terms  of  appendix  B, 


01 

o' 

X  = 

K 

and  V  = 

0 

Vn. 

(34) 


(35) 


and  the  test  is  constructed  by  choosing  K  (called  a  contrast  matrix  in  this  context)  to  compare 
certain  elements  of  X. 


H0\ KX  —  0  and  H±:  KX  *  0. 

The  test  statistic  is  the  associated  quadratic  form 

S  =  (KX) f  (KVK1)  ~ 1  KX. 

Under  the  null  hypothesis,  the  distribution  of  S  is  central  chi-square 

5  ~  Xr  , 


(36) 


(37) 


(38) 


and  the  alternative  distribution  with  true  parameter  value  6  —  M  is  noncentral  chi-square 

5  ~  Kr.s  >  (39) 

where  r  =  rank/6  and  the  noncentrality  parameter  is  d  =  (K M)1  (KV K1)-1  K M . 

Note:  Any  random  variable  Z  has  cumulative  distribution  function  (cdf)  Fz(t)  —  Pr  [Z  <  t]  and 
quantile  function  Qz(u)  —  inf  (x:  Fz(x)  >  it}. 

A  test  with  type  I  error  (probability  of  rejecting  a  true  H0 )  equal  to  a  has  critical  value  S0,  where 
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a  -  Pr[S  >  S0  |  H0]  , 


(40) 

since  large  values  of  S  are  significant,  and  S  exceeds  S0  with  H0 -probability  a.  Thus,  the  critical 
value  is  given  by 


So  —  Qx 2  (1  —  or). 

Based  on  the  observed  value  S  of  S,  the  p-value  of  an  experiment  is 

V  =  Pr[S  >  §  |  H0]  =  1  -  FX2  (5).  (42) 

The  decision  rule  is  to  reject  H0  if  S  >  S0,  or,  equivalently,  if  p  <  a. 

For  a  fixed  alternative  9  =  M  with  KM  =£  0,  the  type  II  error  /?  is  the  probability  of  not  rejecting 
the  null  hypothesis  H0  under  H1,  when  H0  is  false, 

P  =  Pr[5  <  S0  I  HJ  =  Fxh  (  S0)  =  Fx}s  (  Qxf(l  -  a)).  (43) 

The  power  q  of  the  test  is  the  probability  of  detecting  the  alternative, 

q  =  1  -  p  =  1  -  Fx2s  (Sq)  =  1  -  Fx2s  (  Qx 2  (1  -  a)).  (44) 

Illustrations  of  specific  tests  follow.  The  test  for  v50  equality  is 


H0:  [i±  =  M2  and  Ht:  Mi  ^  Mz- 


(45) 


The  contrast  is 


K  =  [  1  0  -1  0 


Then 


(46) 


KX  —  m1—m2  and  KVK 1  =  vltl  +  tz2n  , 


(47) 


and  the  quadratic  form  test  statistic  is 


(m1  —  m2 )2 
vm  +  ^211 


Xi- 


(48) 


Under  the  alternative  hypothesis  with  true  difference  in  mean  Mi  —  M2  =  A,  the  test  statistic  has 
the  noncentral  chi-square  distribution 
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s  ~  xls . 


(49) 


where 


8  = 


v 


in 


+  v 


211 


(50) 


and  the  test  has  power 

«  =  1  -  p  =  1  -  ( ■ so)  =  1  -  (  Crf  (1  -  «))• 

To  compare  response  curves  for  location  and  scale,  test 

H0:  Q/i,Oi)  =  (^2-^2)  and  Hi:  0*lfOi)  *  (te.oi) 


(51) 


(52) 


and  use  the  contrast 


The  test  statistic  S  is  /f ,  with 


1  0  -1  0 
0  1  0-1 


djjj  =  1%  —  m2  and  ds  — 


si  ~  S2> 


ds(Viii  v2ii)  2dsdm(v112  V212)  +  (v122  ^222) 

(,V111  ~  V21l)(.V122  ~  ^222)  —  (v112  +  ^212)^ 


(53) 


(54) 


To  compare  four  v50  estimates,  one  can  safely  discard  the  o  information  and  use  in 


'my 

Vi 

0 

0 

0- 

m2 

and  V  = 

0 

v2 

0 

0 

m3 

0 

0 

^3 

0 

m4 

.0 

0 

0 

V4- 

Tests  for  pairwise  comparisons  are  as  those  just  presented. 

H0:  Hi  =  Hj  and  H^.  fa  *  Hj  , 


(55) 


(56) 


where 
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(57) 


Vi  +  Vj  1- 


To  compare  all  four  v50  estimates, 

H0:  (Vi, ))  fii  =  \i  j  and  H ±:  (3i,;')Mi  *  Hj- 


(58) 


Use 


K  = 


1 

1 

LI 


-1 
0 

0  0  -1. 


0  0 

-1  0 


(59) 


or,  equivalently, 


K  = 


1  -1  0  0 

0  1-10 

0  0  1  -lJ 


(60) 


Both  give  the  same  S  ~  xi-  The  explicit  form  is  tedious,  but  reasonable  software  works  directly 
with  the  matrices  anyway. 


To  compare  the  first  against  the  mean  of  the  other  three, 

H0\y. i  =  (/U  +  Ms  +  A4)/3  and  H x:  [i±  V  (ji2  +  Ms  +  M4)/3, 


use 


K  =  [3  -1  -1  -1]. 


(61) 


(62) 


Then  5  ~ 


8.  Computation 


A  historical  implementation  of  quantal  response  computation  has  been  documented  by  McKaig 
and  Thomas.2  The  original  computations  involved  were  derived  from  first  principles.  (See,  for 
example,  DARCOM  P  706- 103. 3)  The  code  (written  in  FORTRAN)  must  be  compiled  for  use  on 
each  particular  platform.The  modem  approach  in  this  report  uses  GLM  explicitly.  (See  appendix  E 
for  details  on  Maximum  Likelihood  Estimation  (MLE)  for  the  GLM.)  This  technique  provides 
computationally  stable  solutions  for  polynomial  type  models  by  iterated  systems  of  linear 


2McKaig,  A.  E.;  Thomas,  J.  M.  Likelihood  Program  for  Sequential  Testing  Documentation;  ARBRL-TR-02481;  U.S.  Army 
Ballistics  Research  Laboratory:  Aberdeen  Proving  Ground,  MD,  1983. 

'DARCOM-P  706-103.  Engineering  Design  Handbook:  Selected  Topics  in  Experimental  Statistics  with  Army  Applications; 
DARCOM  pamphlet,  U.S.  Army  Materiel  Development  and  Readiness  Command.  1983. 
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equations.  GLM  also  provides  diagnostics  and  estimator  distributions  required  for  inference. 
Furthermore,  written  in  Java,  the  same  code  runs  on  Windows,  Linux,  and  Mac  OS  X. 

GLM  estimation  is  implemented  by  iterative  reweighted  least  squares  (IRLS)  maximization  of 
the  deviance  function  (which  is  linearly  related  to  the  log  likelihood).  For  the  natural  link,  in  this 
case  logit,  IRLS  is  equivalent  to  the  Newton-Raphson  method  of  solving  the  MLE  score  equation 

t'iP)  —  0  >  (63) 

where  L  —  —log  L.  The  estimator  sequence  {/^:  i  —  0,1,2  ... }  begins  with  an  intial  guess  /?0. 
Subsequent  elements  are  generated  by  solution  of  the  linear  differential  approximation 

£'(&)  + (ft + i-ftK"(ft)  =  0.  (64) 

For  the  other  links,  IRLS  is  equivalent  to  Fisher  scoring,  another  method  for  obtaining  the  MLE, 
in  which  the  Hessian  matrix  L"  is  replaced  by  its  expected  value 

L'api)  +  api+1-pi)E[L"api)\  =  o.  (65) 

In  all  cases,  some  stopping  rule  determines  termination  of  the  estimation  sequence. 
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This  stripped-down  Java  program  illustrates  the  Fisher  update  algorithm  computation  for  the 
logistic  model.  The  linear  parameter  estimate  is  b,  and  the  information  matrix  is  M  in  the  code. 


//////////////////////////////////////////////////////////////// 

//  QrDemo.java 

import  static  j ava . lang . Math . * ; 
public  class  QrDemo  { 

public  QrDemo (int  n,  double [ ]  x,  int[]  y)  { 

double  d,  dev,  devO=Double . POSITIVE_INFINITY,  v,  w,  A [ ] = { 0 ,  0}, 

M[ ][]={{ 0, 0} ,{ 0, 0}  } ,  mu [ ] =new  double [n],  eta[]=new  double [n],  b[]={0,0}; 

for  (int  iterations  =  1;  iterations  <=  64;  iterations!!)  { 

dev  =  0; 

for  (int  i  =  0;  i  <  n;  i+t)  { 

eta [i]  =  b [0]  +  b[l]  *  x[i]; 
mu[i]  =  1/ (1+exp (-eta [i] ) ) ; 

dev  +=  y[i]  ==11  log(mu[i])  :  log ( 1-mu [ i ]) ; 

} 

dev  *=  -2; 

System. out. printf ("%2d:  dev  =  %23.16e  ,  b[]  =  (%23.16e,  %23.16e)\n", 
iterations,  dev,  b[0],  b[l]); 

if  (abs ( (devO-dev) /dev)  <  le-16)  {  break;  } 

devO  =  dev; 

A [ 0 ]  =  A [ 1 ]  =  M [ 0 ] [ 0 ]  =  M [ 0 ] [ 1 ]  =  M [ 1 ] [ 0 ]  =M[1][1]  =0; 
for  (int  i  =  0;  i  <  n;  i+t)  { 
v  =  mu [ i ]  *  ( 1  -  mu [ i ] ) ; 
w  =  y [ i ]  -  mu [ i ] ; 

A  [  0 ]  +—  w ; 

A  [  1 ]  +=  w  *  x [ i]  ; 

M[0]  [0]  +=  v; 

M [ 0 ] [ 1 ]  +=  v  *  x [ i ] ; 

M [ 1] [ 1 ]  +=  v  *  x [ i ]  *  x [ i ] ; 

} 

d  =  M[0] [0]*M[1] [1]  -  M[0] [ 1 ] *M [ 0 ] [1]; 
b [0]  +=  (  A [ 0 ] *M [ 1 ] [ 1 ]  -  A[l] *M[0] [1]  )  /  d; 
b [ 1 ]  +=  (-  A[0] *M[0] [1]  +  A[l] *M[0] [0]  )  /  d; 

} 

System. out. printf ("b []  =  (%1.5f  %1.5f)\n",  b[0]  ,  b [ 1 ] ) ; 

} 

public  static  void  main (String  args [ ] )  { 

int  n=10; 

double  x[]  =  {  2620,  2667,  2717,  2718,  2721,  2724,  2744,  2811,  2840,  3020}; 

int  y  [  ]  =  {  0,  0,  0,  0,  1,  1,  0,  1,  1,  1}; 

new  QrDemo (n,  x,  y)  ; 

} 

} 

//////////////////////////////////////////////////////////////// 
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Compiling  and  running  the  program  exposes  the  deviance  convergence  and  estimator  sequence. 


$  javac  QrDemo.java 
$  java  QrDemo 


1 : 

dev 

= 

1 .3862 94361 1198906e+01 

b 

2  : 

dev 

= 

9. 4 19361 84569861 90e+00 

b 

3: 

dev 

= 

8.1337185880127830e+00 

b 

4  : 

dev 

= 

7. 66050663092 87860e+00 

b 

5: 

dev 

= 

7. 5662084 6907452 90e+00 

b 

6: 

dev 

= 

7.5591586884309470e+00 

b 

7  : 

dev 

= 

7. 55909754 68338840e+00 

b 

8: 

dev 

= 

7. 55909754 12 867530e+00 

b 

9: 

dev 

= 

7. 55909754 12 867560e+00 

b 

10: 

dev 

= 

7. 55909754 12867385e+00 

b 

11 : 

dev 

= 

7 .55909754 12 867440e+00 

b 

12  : 

dev 

= 

7. 55909754 12867530e+00 

b 

13: 

dev 

= 

7. 55909754 12867530e+00 

b 

b [ ]  =  (-112.13780  0.04100) 


(  0.0000000000000000e+00 
(-3.2157347386093360e+01 
(-5. 91722 17539474 130e+01 
(-8.6314382350221580e+01 
(-1 .04 66244 931233070e+02 
(-1.1142261623953550e+02 
(-1 . 1213096248281227 e+02 
(-1 . 1213779766541855e+02 
(-1 . 12 1377 982 92304 98e+02 
(-1 . 1213779829230472 e+02 
(-1 . 12 1377 982 923042 8e+02 
(-1 . 1213779829230556e+02 
(-1 . 1213779829230532 e+02 


0.0000000000000000e+00) 
1. 1 65881 6396959381e-02) 
2. 15 607 7 7 82 350303 6e-02) 
3. 1517928280382454e-02) 
3. 825378062 604 91 90e-02) 
4. 07361 92 959547 696e-02) 
4. 0996302051 673934e-02) 
4. 099881 1895006204e-02) 
4. 0998812 1251 93504e-02) 
4 .0998812 1251 934 10e-02) 
4. 0998812 1251 93250e-02) 
4 .0998812 1251 937 10e-02) 
4. 0998812 1251 93630e-02) 


9.  Applications 


The  code  can  be  extended  to  compute  parameter,  variance,  and  correlation  estimates  for  different 
parameterizations  and  display  the  results  along  with  a  Lower  Confidence  Bound  (LCB)  on  the 
v50.  This  example  implements  the  logit  link  with  Fisher  scoring  optimization,  using  data  with 
stimulus  (velocity)  in  the  first  column  and  response  (penetration)  in  the  second  column: 


600  . 

.000 

1 

579. 

.500 

0 

580  . 

.400 

0 

616. 

.400 

0 

626. 

.200 

1 

627  . 

.000 

1 

599. 

.  800 

0 

614  . 

.  900 

1 

613. 

.  100 

1 

575. 

.000 

0 

In  practice,  one  works  with  the  standardized  predictor  it  and  linear  parameter  a  [  ]  =  a  = 
(a0,  ax)  of  section  5  and  computes  its  variance  Va=  Va  and  correlation  Ra. 


a  [  ]  =  (-0.19415,  2.2279),  Va  =  (  0.88564,  -0.35817,  .  ,  1.7373), 

Ra  =  -0.28876 

Transformations  provide  the  usual  linear  and  location-scale  parameterizations  b  [  ]  =  /?  and 
ms[]  =  (mu,sg)  =  9  and  their  variances  Vb  =  Vn  and  Vms  =  Vg  and  correlations. 

b [  ]  =  (  -73.045,  0.12077),  Vb  =  (  1881.9,  -3.0988,  .  ,  0.0051048), 
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Rb  =  -0.99978 


ms [ ]  =  (  604.84,  8.2804),  Vms  =  (  57.348,  -6.3638,  .  ,  23.998), 

Rms  =  -0.17154 

Logit  Response  Parameters:  (  mu  ,  sg  )  =  (  604.84  ,  8.2804  ) 

The  location-scale  version  is  required  for  confidence  bounds  on  v50,  since  its  standard  deviation 
comes  from  Vq.  The  standard  normal  95%  quantile  Z_95  =1.645  gives  a  95%  LCB  on  v50. 

V50  estimate  =  mu  =  604.84 

V50  standard  deviation  =  SD.mu  =  7.5728 

95%  LCB  on  V50  =  mu  -  SD.mu*Z  95  =  592.38 

Computations  must  take  place  in  one  of  the  linear  versions,  and  the  standardized  version  avoids 
numerical  problems  caused  by  collinearity,  as  indicated  by  the  extreme  correlation  Rb  of  the  raw 
linear  version. 

LangMod  (Collins  and  Moss4)  is  a  Java  GUI  implementation  of  a  modified  Langlie  sequential 
strategy  for  quantal  response  testing.  LangMod  has  been  used  to  support  various  customer  and 
research  programs  for  testing  personal  protective  equipment  as  well  as  ground  and  aircraft 
targets.  LangMod  incorporates  (among  other  things)  the  logistic  regression  calculations 
developed  in  this  report  and  displays  QR  calculation  results  and  a  graph  of  the  data  and  response 
function  estimate  as  shown  in  figure  3. 


^Collins,  J.;  Moss,  L.  LangMod  Users  Manual ;  ARL-TN-437;  U.S.  Army  Research  Laboratory:  Aberdeen  Proving  Ground, 
MD,  2011. 


15 


quantal  response  (logit) 


v 


o  PP  ♦  CP  —  zmr  —  F[v]  •  mu  — -  3&3  -f-  v50 


Figure  3.  Quantal  response  from  LangMod. 

Similar  capabilities  are  implemented  in  a  local  S-PLUS  library,  libv50 ,  which  has  been  used 
in  support  of  various  projects  (see,  for  example,  Collins  et  al.5).  This  library  uses  the  native 
S-PLUS  GLM  computations.  For  example,  in  S-PLUS,  the  GLM  computation  can  be 
implemented  as 

fit  <-  glm(x~v,  family=binomial  (1 i nk=l ogi t) ,  data=z) 

where  the  data  frame  z  contains  stimulus  velocity  v  and  response  penetration  x  E  {0,1}.  Then, 
glm  returns  model  coefficient  estimates  /?  =  ( b0 ,  by)  in  f i  t$coef f i  ci  ents  and  estimated 
parameter  variance  matrix  Vp  in  summary  (fi  t)  $cov .  unseal  ed.  The  library  implements  the 
reparameterization,  hypothesis  testing,  and  confidence  interval  procedures  outlined  in  this  report. 
The  computations  and  graphics  for  figures  1  and  2  were  generated  by  libv50. 


5Collins,  J.,  et  al.  Sensitivity  of  Mounting  Methods  or  the  Outer  Tactical  Vest  and  Shoot  Pack  Ballistic  Limit,  Phase  I: 
Current  Mounting  Methods:  ARL-TR-4116;  U.S.  Army  Research  Laboratory:  Aberdeen  Proving  Ground,  MD,  2007. 
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Appendix  A.  Vector  Function  Conventions 


A.l  Notation 

Vectors  are  columns.  With  x  G  Mn, 


(A-l) 


and  if  /:  Mn  -*  Mk,  then  /(x)  G  Mk 

and 

fix)  = 

r/i(*)i 

/2<X> 

_ 

-AW. 

f2(x1,...,xn) 

fk  (-Tl>  •••  >  %n)- 


(A-2) 


If  A  is  an  n  x  k  matrix,  the  aL  ,  denotes  the  element  in  row  i  and  column  j.  The  transpose  of  A , 


B  —  A1 , 


is  a  k  x  n  matrix  and  has  bLj  —  a^. 


A.2  Inner  Products  and  Norms 

The  usual  inner  product  is 


IL 

(x,y)  =  xly  =  ^Xjyj, 


and  a  weighted  inner  product  is 

Mw  =  x<Wy  =  Y£XiWiiy, 


1=1 ]  = 1 


The  associated  (squared)  norms  are 

n 

||x||2  =  (x,x)  =  xtx  =  ^  xf 


(A-3) 


(A-4) 


(A-5) 


(A-6) 
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and 


n  n 

\\x\\ w  =  xtWx  =  ^  ^  XiWijXj. 

i  =  1  j  =  1 


In  the  event  that  VK  is  a  diagonal  matrix, 


(x,y)w 


I  L 

—  xtWy  —  ^ 


XiWuyi 


1=1 


and 


x 


w 


=  (x,  x) 


w 


-I 

i  =  1 


(A-7) 


(A-8) 


(A-9) 


A.3  Derivatives 


The  derivative  of  /  is  the  n  x  k  matrix  function  in  which  column  j  is  the  derivative  dfj/dx  of 
the  coordinate  j  scalar  field  fj.  So  the  element  in  row  i  and  column  j  is  dfj/dx and  the 
derivative  is 


[df. 

df2 

dfkl 

dx. 

dx. 

dx. 

df. 

df2 

dfk 

dx2 

dx2 

dx2 

df. 

df2 

dfk 

_dxn 

dxn 

dxn. 

(A- 10) 


In  particular,  the  derivative  of  a  scalar  field  /:  Mn  M1  is  a  column  vector 


df 

dx 


df~ 
dxx 
df 
dx2  ■ 

df 
. dxn . 


(A- 11) 


The  linearization  of  /  at  x0  is 
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f(x0  +  8)  =  f(x0)  +  (  faf(Xo)  ]  •  5. 


(A- 12) 


If/: 


is  given  by  a  linear  transformation  with  k  x  n  matrix  A, 


f(x)  =  Ax  — 


an  a12 
a21  a22 


\P-kl  ak2 
Then  the  derivative  is  the  n  x  k  matrix 


Oi  n 
O2  n 

Okn 


X1 

'Z"=i  axjxf 

x2 

= 

lf=1a2jXj 

xn_ 

Xj  =  l  akjXj. 

(A- 13) 


-an 

0-21  ■ 

"  akl 

a12 

a22 

"  ak2 

O-in 

0-2  n 

O-kn. 

d/ 

dx 


and  /(x0  +  d)  =  /(x0)  +  j45,  as  expected. 


f(x)  =  xtA  =  [Xi  x2  •••  xn] 


A1, 


-an 

0-12  ' 

"  alk- 

~^i  =  lxiOn 

<*21 

a22 

"  a2  k 

— 

Zi  =  lxiOi2 

_a/cl 

a/c2  ' 

O-nk. 

Zi  =  i  XiOik. 

(A- 14) 


(A- 15) 


Then  the  derivative  is  the  n  x  k  matrix 


df 

dx 


-an 

Ol2  ' 

'  olk- 

o2i 

a22 

'  a2k 

=  A. 

(A- 16) 

P-ni 

an2 

O-nk. 

So  derivatives  of  inner  products  and  (squared)  norms  are 


f{x,y)=f(y,x)=y, 


(A- 17) 


d  d  . 

-f-(x,y)w  =  Wy  and  —{y,x)w  =  Wzy , 


(A- 18) 


SM2  =  2*. 


(A- 19) 
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(A-20) 


(W  +  Wl)x, 


and 


—  ||x  -  a\\w2  -  (W  +  W^ix  -  a). 


Some  second  derivatives  are 


d2 

dxx t 


IMI2  —  2/ 


(A-21) 


(A-22) 


and 


d2 

dxx 1 


=  W  +  Wf 


(A-23) 


A.4  The  Chain  Rule 

If  x  itself  is  a  function  with  x:  Mm  -*  Mn.  write  x  =  x(u)  G  Mn  for  u  G  Mm. 

Then  q  =  f  °  x:  EAm  -*  Mfe  and  the  elements  of  the  m  x  k  matrix  —  are  —  =  YIL,  for 

du  dUi  r  dXp  dU( 

i  —  1, ... ,  m  and  j  —  1, ... ,  k.  So,  the  derivative  of  g  is 


dg_ 

du 


r  n 


I 


dxp 
dxp  du± 


i 


df±  dxp 

dXp  du 2 


z 


df±  dxp 

dXp  dUyyi 


dxx  dx  2 
dux  du -y 
dx dx  2 
du2  du2 


dx- ^  dx  2 
dum  dum 


Zdf2  dxp 
dxv  du± 

n  —  1  ** 


Zdf2  dxp 

dxv  du2 
—  1  ' 


Zdf2  dxp 
dxv  dum 

-rt  —  1  ' 


I 

P  =  1 
n 

I 


I 


d/fc 

dxp 

dxp 

du± 

d/fc 

dxp 

dxp 

du2 

dfk 

dxp 

dxp 

dum 

s 

"CS 

r  df±  df2  dfk  j 

du± 

dx-^  dx dx-^ 

dxn 

df i  d/2  d/fc 

du2 

dx2  dx  2  dx2 

dxn 

dfi  df2  dfk 

d-V-im- 

-dXyi  dXyi  dx^i~ 

dx  df 
du  dx 


(A-24) 
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Thus  follows  the  chain  rule  for  vector  fields 


d 

du 


,  >  d  d 

f[x(.u))  =  ^-x(u)  ‘  -^f(.x)\x=x(uy 


dxJ 


(A-25) 


_ ^ 

In  particular,  with  x  =  f_1  and  f(x(u))  =  u,  then  =  1  and  -pf-1(u)  =  f^-f(x)^ 


x=r1(a) 


Simply  write  —  = - when  f  is  a  function  of  x,  which  is,  in  turn,  a  function  of  u  and  write 

du  du  dx 

dx  /du\-1  .  , 

s;  =  U,)  •in8eneral- 


For  example, 


A  \\Bx\\L  =  (Bx)  •  ( W  +  Wt)Bx  =  Bt{W  +  W^Bx  , 
dx  dx 


(A-26) 


and 


-7-  ||Bx  +  a\\L  =  -^-(Bx)  •  ( W  +  W^Bx  +  a)  =  Bl(W  +  W^Bx  +  a), 
ax  ax 


(A-27) 
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Intentionally  lelt  blank. 
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Appendix  B.  Multivariate  Normal  Distribution  and  Quadratic  Forms 


B.l  Normal  Distribution 

The  vector 

xi 

X  =  i 

xn. 

has  the  multivariate  normal  distribution 

X~Nn(M,V), 


with  mean  vector 


(B-l) 


(B-2) 


(B-3) 


and  covariance  matrix 


Vin 

Vnn 


(B-4) 


when  each  Xj  is  normally  distributed  with  mean  and  variance  v a  so  that  Xj  ~  IV  (mj,  %)  and 
the  Xj  are  related  by  Co v(xi,x/)  =  v^.  So,  EX  =  M  and  Var X  =  E[(X  —  M)(X  —  M)f]  =  V . 

The  probability  density  function  (pdf)  of  X  is  /(x),  where 


(2n)n/2\V\1/2f(x)  =  exp  -^{x-MfY  1(x  -  M)  =  exp  ||x  -  .  (B-5) 


In  particular,  the  Xj  are  independent  if  V  is  a  diagonal  matrix.  If  V  —  vl  where  /  is  the  identity 
matrix,  then  all  Xj  have  the  same  variance  v.  If  V  —  I,  then  all  Xj  have  unit  variance.  If, 
additionally,  M  =  0,  then  the  Xj  are  independent  standard  normal  N  (0,1). 

For  any  n-  vector  A , 


(B-6) 


X  has  the  translation  property 
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X  +  A  ~  Nn(M  +  A,  V)  . 


(B-7) 


For  any  linear  transformation  K\  En  -*  Rr, 

r k-n  •••  kln 


the  distribution  of  KX  is 


KX  ~  Nr(KM,  KVK1). 


(B-8) 


(B-9) 


Useful  conditions  are  that  V  is  nonsingular  (positive-definite)  and  that  r  <  n  and  rank  K  —  r,  so 
that  KVKl  is  also  nonsingular  (positive-definite). 

B.2  The  Quadratic  Form 

Associated  with  each  nondegenerate  multivariate  normal  A  is  a  standard  quadratic  form  which 
has  a  central  x2  distribution.  Any  symmetric  positive-definite  square  matrix  V  has  a  “square 
root”  U  which  can  be  obtained  through  Choleski,  singular  value,  or  spectral  decomposition 
where  V  —  UUt .  Then  the  inverse  W  —  V~x  can  be  written  as  W  =  TlT  where  T  =  U~1.  Now 
if  X  ~  Nn(M,  V ),  then  A  -  M  ~  Nn( 0,  V)  and  Z  =  T(X  -  M)  ~  Nn( 0,  TVT1).  Since 
TVT1  =  TUUtTt  —  I,  it  follows  that  Z  ~  Nn( 0,  /),  and  the  elements  of  Z  are  iid  N( 0,1),  so 


n 


i= 1 


=  ZfZ 


Xn> 


(B- 10) 


which  is  chi-squared  with  n  degrees  of  freedom.  In  terms  of  matrices, 

ZfZ  =  (r( A  -  M))t7’(A  -  M)  =  (A  -  M)t7’t7’  (A  -  M) 

=  (A  -  MYW  (A  -  M)  =  (A  -  M)F_1(A  -  M)  (B-ll) 

is  the  quadratic  form  associated  with  A  ~  Nn(M ,  V ).  It  has  the  chi-squared  distribution 

ZfZ  =  (A  -  MYV-\ X  —  M)  ~  Xn  (B- 12) 


with  n  degrees  of  freedom. 

There  is  also  a  noncentral  quadratic  form  associated  with  A.  Suppose  now  that 
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Y  —  TX  ~  Nn(TM,  /) 


(B-13) 


and  let 


TM  = 


rdii 


Ld 


n  J 


The  yj  are  independent  normal  /V  (d,-,  1),  and 

n 

2  ^  ~  *n.a  ■ 

i=l 

This  is  a  noncentral  chi-squared  distribution  with  noncentrality  parameter 


(B-14) 


(B- 15) 


n  n 


5  =  =ZE[yi]2-  (b-16) 

i=l  i=l 

In  the  literature,  the  noncentrality  parameter  is  variously  considered  to  be  8  or  8/2  or  VS. 
Anyway, 


n 

^yt?  =  ytY  =  ( TXfTX  =  XtTtTX  =  XtWX  =  XtV~1X. 

i=l 


(B-17) 


Furthermore, 

n 

8  =  ^  df  =  ( TMfTM  =  MtTtTM  = 

i= 1 

So  the  noncentral  quadratic  form  associated  with  X  ~  Nn(M,  V)  is 

Y‘Y  =  xty~lx  ~  iv-M- 


(B-18) 


(B- 19) 


which  has  the  noncentral  chi-square  distribution  with  n  degrees  of  freedom  and  noncentrality 
parameter  8  =  MtV~1M. 

It  is  possible  to  combine  transformations  and  quadratic  forms.  Start  with 

X  ~  Nn(M,V).  (B-20) 

The  transformed  centered  vector  is 
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K{X-M )  ~  Nr(0,KVKt')  , 


(B-21) 


and  its  quadratic  form  is 

[K(X  -  M)\t(KVKty1[K(X  -  M)]  ~  Xr- 
The  transformed  (uncentered)  vector  is 

KX  ~  Nr(KM,  KVK^  , 


(B-22) 


(B-23) 


and  its  quadratic  form  is 


(KXytKVK^KX  ~  x 


r,(KMY{KVK c)  KM  ’ 


(B-24) 


which  is  noncentral  chi-square  with  r  degreed  of  freedom  and  noncentrality  parameter 


8  =  (KM)t(KVKt)~1KM. 


(B-25) 
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Appendix  C.  Maximum  Likelihood  Estimator  Distributions 


In  general,  the  joint  density  (likelihood)  L  of  a  sample  {Xi}1jL=l  depends  on  a  k-dimensional 
parameter  xp  =  (xpll ... ,  xpk ),  as 

L(X1,...,Xn-,xp).  (C-l) 

The  log  likelihood  is  denoted  L  =  log  L.  A  maximum  likelihood  estimator  xp  of  xp  satisfies 


xp  —  argma  xL(xp)  —  argma  xL(xp). 
The  kxk  Fisher  Information  Matrix  for  the  entire  sample 


(C-2) 


(dL\  (dLx1 
M^  =  E  \dPp)  '  \Php) 


(C-3) 


has  inverse 


Vrp  =  MPp1 . 


(C-4) 


The  asymptotic  distribution  of  the  MLE  is 


yfn(xp  — 1/>)  -»  N^O.nV^  as  n  ->  oo 


(C-5) 


If  the  samples  are  independent,  then  the  likelihood  is 


u 

=n^)- 


(C-6, 


Its  logarithm  is 


u 

■C  —  log L  =  ^log  fiiXP)  , 


(C-l) 


and  the  derivative  is 
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dL  v  d 

!4,  =  Ld4,'osf^- 

i= 1 


The  outer  product  is 


dL\  (dL 


#)  ■  U)  =  Z  W,o« /iW))  ■  (dmioB  m)) 

i-1 


wt)  ■  {&109 

i*j 


and  since 


'  d  1  /"Id/  rdf  dr  d 

E[d^l0g^J  =  J  Jdxp  ^  =  J  dip  ~  dip  J  ?  =  Up1  =  °' 


the  information  for  the  entire  sample  is 


IL 

!>■ 


where  Mt  is  the  information  matrix  for  a  single  observation 


Furthermore,  if  the  samples  are  identically  distributed,  then  Mw,  =  nM0,  where 


(C-8) 


(C-9) 


As  the  samples  are  independent,  the  information  matrix  ML  is 

■[©  ©1-f1 -OH 

i= 1 

+ 1 E  [(^log  )]  - E  [(v°g  ffo))]  ■  (c-io) 


(C-ll) 


(C-12) 


(C-13) 
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M0  =  E 


{^°sn><o))  •  (^iog /w) 


d 

dip 


(C-14) 


In  this  case,  with  =  M0  1,  the  variance  is  =  M^1  =  n  1M01  —  n  1V0,  and  for  an  iid 
sample,  the  result  is  the  usual 

Vn(xp  —  xp)  -*  Nk(0,  V0 )  as  n  -»  oo 

xp  ~  Nk(\p,n~1V0).  (C-15) 

Now  consider  another  parameterization  <p.  The  chain  rule  gives 


dL  dcp  dL 
dxp  dxp  dcp  ’ 


(C-16) 


so  the  information  matrix  transformation  is 


Mx/,  —  E 


/ dcp  dL\  / dcp  dL\l 
\dxp  dcp  J  \dxp  dcp) 


dcp  /d£\ 
dxp  \dcp)  \dcp) 
dcp  dcp 1 
dxp  ^  dxp 


dcp 

dxp 


(C-17) 


The  chain  rule  also  gives  the  inverse  of  a  derivative  matrix  as  the  derivative  of  the  inverse 
function,  so 


Mw,"1  =  1-^-1  | 


' dcp 


-l 


-i  (dcp\  1 


,dxp 


dcp\ 

\dxp) 


(C-18) 


and  the  corresponding  variance  transformation  is 


dxp  dxp 
V^~d$  V(p  dcp  ' 


(C-19) 
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Intentionally  lelt  blank. 
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Appendix  D.  The  Linear  Model 


D.l  The  Basic  Model 


The  usual  linear  model  is 


Y  —  X/3  +  e  , 


(D-l) 


where  Y  is  an  n  x  1  response,  X  is  an  n  x  p  independent  matrix,  /?  is  a  p  x  1  parameter,  and  the 
n  x  1  error  e  ~  1V(0,  er2/n).  So  E  T  =  X/3  and  Var  Y  —  er2/n.  Each  column  of  X  is  a  linear 
predictor,  and  the  model  is 


t' 


(D-2) 


One  works  with  the  p  -parameter  model  for  a  single  predictor  v  by  choosing  a  set  of  fixed  basis 
functions  {/1; ...  ,fp )  and  setting  XLj  —  fj{vL). 


r 


(D-3) 


For  example,  choice  of  /.-  (y)  =  y;  1  gives  the  polynomial  model 


y  =  Po  +  /?iv  +  (32v2  +  •••  +  /?p_iVp  x. 


(D-4) 


(See  equation  B-5  in  appendix  B.)  Solution  by  least  squares  is  equivalent  to  maximum 
likelihood  for  normal  error,  and  the  criterion  is  to  choose  u  that  minimizes  Q  —  ete  =  ||e||2 
=  || Y  —  X(3\\2  since  e  —  Y  —  X(3.  This  is 


2  -  2 p2XtY+  \\Xp\\2. 


(D-5) 


The  solution  follows  from  setting  the  derivative  to  0, 


=  -2yfy  +  2xtxp  =  o , 

dp 


(D-6) 


to  obtain  the  normal  equations 
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xtxp  =  XlY 


(D-7) 


with  solution 


^(Xtxr'Xty  (D-8) 

and  response  estimate 

T  =  HY,  (D-9) 

where  the  so-called  hat  matrix  is 

H  —  X  (X^)-1^.  (D-10) 

Note  that  E  /?  =  p  and  Var  /?  =  cr2(ArtAr)_1. 

Modern  software  for  linear  least-squares  estimation  operates  on  equation  D-7  through  the 
response  vector  Y  and  the  design  matrix  X.  The  normal  equations  are  solved  efficiently  without 
inverting  the  design  matrix,  and  software  provides  parameter  estimates  and  diagnostics  such  as 
the  parameter  variance  and  hat  matrix  diagonal. 

D.2  The  Weighted  Model 

When  the  error  is  IV (0, 1),  the  correct  inner  product  is  weighted  by  the  symmetric  W  =  I-1,  so 

Q  =  e' 'We  =  \\e\\2w  =  \\Y  -  xp\\w ■  This  is 

Q  =  \\Y\\w  ~  2 PWWY  +  \\Xp\\w ■  (D- 11) 


Then 


dQ 

d/3 


=  -2  XtWY  +  2XtWXp  =  0. 


(D-12) 


The  normal  equation  is 


XtWXp  =  XtWY. 


(D-13) 


The  solution  is 


p  =  (x^xywwY, 

and  the  response  estimate  is  Y  =  HY,  where 


(D-14) 


32 


H  =  X(  XtWX)~1XtW. 


(D-15) 


Note  that  E  ft  =  /?  and  Var  /?  =  (X^Xy1. 

Modern  software  for  linear  least-squares  estimation  operates  on  equation  D-13  through  the 
response  vector  Y,  the  weight  vector  W,  and  the  design  matrix  X.  The  normal  equations  are 
solved  efficiently  without  inverting  the  design  matrix,  and  software  provides  parameter  estimates 
and  diagnostics  such  as  the  parameter  variance  and  hat  matrix  diagonal. 

Note  that  for  these  models,  the  likelihood  function  is 


L(/?)  =  (2tt)  n/2|W|1/2exp 


\  \\Y-XP\\2W  . 


(D-16) 


Its  log  derivative  is 


—  log  L(P)  =  XtW(Y-Xp), 


and  the  information  matrix  is 

Mp  =  XtWX. 


(D-17) 


(D-18) 
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Appendix  E.  The  Generalized  Linear  Model 


E.l  Model  Formulation 

In  the  Generalized  Linear  Model  (GLM),  the  response  Y  has  an  arbitrary  distribution,  g  =  Xf3  is 
a  linear  function  of  the  k-dimensional  parameter  /?,  and  the  mean  response  is  modeled  as 


H  =  E[Y\X]  =  G(77) 


(E-l) 


for  some  monotone  link  function  G  with  derivative  g  =  G' .  (Some  authors  call  G  1  the  link.) 
Response  distributions  are  taken  to  be  from  a  single-parameter  exponential  family,  with  the  form 


yd  — b(9 ) 

fiy,S,xp)  =  exp  ^  —  +  c(y,ip )  . 

The  parameter#  is  to  be  estimated,  and  xp  is  a  nuisance  parameter. 


(E-2) 


With  £  —  log/,  calculate  the  moments  of  Y  in  terms  of  exponential  family  components. 


E\^£(y,e,xp)\  =  0 


(E-3) 


because  /  f(y,  9,  ip)dy  =  1,  and  under  suitable  regularity  conditions,  0  =  —  /  f(y,  9,  ip)dy 

dO 

=  i •  f(y,9,xjj)dy.  Therefore,  E[(y  -  b'(9))/a(xp)]  =  0,  and 


E[E]  =  g  =  G(jf)  =  b’(9). 


(E-4) 


Also,  as  usual, 


E[(A€(y.0,«)  ]  =  -E[T_€(y.0.« 

because  Var  [±t]  =  E  =  f  ■  f  dy  =  /  ■  ±f  dy  =  ■  // 


(E-5) 


Therefore,  E[(y  —  g)2/a(xp)2]  —  —  E[—b"(9)/a(xp)],  and 
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Var  Y  —  v(g)a(ip)  —  b"  (d)a(xp), 
where  v(g)  =  Var [Y]/a(xp)  —  b”(9 )  =  g(ji ) 


(E-6) 


In  the  case  that  77  =  6,  and  hence  g  =  G(rf)  =  G(0),  G  is  called  the  canonical  link  function. 
Then  g  =  b'(9)  =  G(9)  =  G(jj )  and  v(g )  =  b"(9 )  =  $(0)  -  g(ji ). 

E.2  Estimation 

Let  [M  •"  *n]  =  X1 ,  so  the  (column)  vector  x*  is  row  i  of  X,  gt  =  xf/?,  and  9t  —  9(rji'). 

Maximum  likelihood  estimation  for  the  GLM  is  accomplished  by  maximizing  the  log  likelihood 
function 


_  v  ViOi-Wt) 


1=1 


1=1 


a  0/0 


+  c(yi,xp) 


(E-7) 


This  is  a  weighted  least-squared  problem  where  the  design  and  weight  depend  on  the  unknown 
parameter,  and  it  can  be  solved  iteratively  by  the  Newton-Raphson  method. 

The  Newton-Raphson  Method 

In  one  dimension,  a  zero  of  F  is  obtained  by  linearizing  and  updating  the  current  argument  x0  to 
x,  solving  F(x0)  +  (x  —  x0)F'(x0 )  =  0  to  get  x  =  x0  —  F(x0)/F'(x0).  Optimize  F  by  setting 
F"  —  0,  so  the  update  is  x  =  x0  —  F'(x0)/F"(x0). 


d2  1  d 

The  vector  version  is  x  =  x0  —  -^~t  F(x0)  —  F(x0).  Taking  x  =  x0  +  5,  the  increment 


8  =  x  —  x0  satisfies 


E(x0) 


—  —  F(x0).  Some  derivatives  are  required. 


Gradient 


Differentiating  gives  the  gradient  (vector  of  first  derivatives) 


d£  ,  v-1  d9j 


1=1 


dp 


(E-8) 


Since  ^b\9i')  =  b"{9i')'^  =  v{gi')^  and  ^h'(^)  =^  =  —G(i g)  =  —  G(xfF) 

dp  y  u  y  u  dp  dp  dp  y  u  dp  dp  y  lu  dp  v 

=  g(j]i)Xi,  then 
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ddj  _  gpjiJ 

dp  vM X> 


So  the  gradient  is 


D(/?)  =  a(xp)  1  •  ^(y£  -  gd  •  ~ryxi  =  a0/0  1  ‘  X^fUf, 


1=1 


v(jiO 


where,  for  i  —  1, ... ,  n.  the  diagonal  weight  matrix  WFhas  elements  wFi  —  g(jli) 


WF  =  diag 


£0?  i)2 
-  v(Ml) 


a(.Vn)2' 

v(jln)  ’ 


and  the  centered/scaled  response  vector  UF  has  elements  uFi  =  (yt  —  gd/g(gt). 


UF  — 


yi  -  Mi 
.  g(jh) 


yn  dn 

giVn) 


Hessian 

Using  =  v'(jii)g(xfp)xt. 


d2  n  g'(vi>(gd  -  g(xid2v\g.d  t 

w8i  = - 777 - 


and  the  Hessian  (matrix  of  second  derivatives)  is 


W)  =  =  *00 


=  a  0/0" 


-‘■a 


t=l 

2 


ZgOny 

vQid 

i=l 


ddj  ddf  d2 

-b"(e‘)WW+(y^flddj^e‘ 

g'Oldvfai)  -  g(gd2v'(gi') 


+  (yi  -  gd 


v(gty 


Xi 


=  aixpy1  ■  XlWNX, 


where 


wN  =  wF  -  wD 

and  WD  is  a  diagonal  matrix  with  diagonal  elements 


(E-9) 

(E-10) 

2 /vM, 

(E-ll) 

(E-12) 

(E-13) 


(E-14) 

(E-15) 
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(E-16) 


wDi  =  (yt  -  gd 


g'(jii)v(jii)  -g(Vi)  V(nd 


KMl)2 

Since  Efl/V^]  =  0,  the  expected  value  of  the  Hessian  is 

E  H(J3)  =  -aiipy1  ■XtWFX. 
The  Fisher  Information  Matrix  is 


Mp  —  E 


d  d  . 

—  L - £} 

=  -E 

d2 

l 

.dp  dp 

dppt 

=  -  E  Jf(/?)  =  -aixpy1  ■  XtWFX, 


and  the  asymptotic  estimator  distribution  is  Nk  (/?,  a(xp)  ■  (XtWFX  )_1). 

Newton-Raphson 

Now,  apply  the  Newton-Raphson  algorithm  iteratively  to  solve  the  optimization. 
For  GLM,  the  Newton-Raphson  update  is  /?  =  /?0  +  8,  where 

=  -»(/?o) 

(XtWNX)S  =  XtWFUF 
(XtWNX)S  =  XtWNWu1WFUF 
(X^X^S  =  XtWNUN 


with 


UN  =  W^WpUp. 


(E-17) 


(E-18) 


(E-19) 


(E-20) 


These  are  the  normal  equations  for  minimization  of  Q  =  ||  UN  —  XS\\wN-  Both  WN  and  UN 
depend  on  /?0.  The  normal  equations  can  be  solved  iteratively  with  an  initial  guess  /?0  by 
calculating  g  —  Xpo,  g  =  G(r /),  g ,  g',  v,  v' ,  WF,  UF ,  WD,  WN,  and  UN.  Then  solve  for  8.  The 
updated  solution  is  /?  =  /?0  +  8.  Now  replace  /?0  with  /?,  and  repeat.  This  is  iteratively 
reweighted  least  squares  with  Newton-Raphson  update. 

Fisher  Scoring 

For  the  GLM,  the  Fisher  Scoring  update  uses  E  K  in  place  of  K  to  get 
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E H(Po)5  =  -D(/?0)  and  (X^pX^S  =  X^pUp, 


(E-21) 


which  are  the  normal  equations  for  Q  =  \\UF  —  X8 \\wF-  Both  WF  and  Up  depend  on  /?0.  The 
normal  equations  can  be  solved  iteratively  with  an  initial  guess  /?0  by  calculating  77  =  A/?0, 
g  —  G(ji),  g.  v,  WF,  and  UF.  Then  solve  for  8 ,  update,  and  repeat.  This  is  iteratively  reweighted 
least  squares  with  Fisher  scoring. 

E.3  Canonical  Link 

For  the  canonical  link,  wFi  =  g(j]{)  =  v( g{)  and  uFi  —  (yt  —  g^/wpp  Also,  since  d9Jd(3  = 
and  d2di  /d/3l3t  —  0,  it  follows  that  E  =  “K (/?).  So  Newton-Raphson  and  Fisher  scoring 
are  equivalent. 

E.4  Confidence  Intervals 

Normal- approximation  100y%  confidence  intervals  on  the  mean  response  are  given  by 

G{xp  ±  (hi-d -y)/2^xtVx),  (E-22) 

where  <t>  is  a  standard  normal  quantile,  V  is  the  estimated  parameter  variance  matrix,  and  x  is  a 
row  of  an  X  matrix  corresponding  to  the  desired  level.  For  the  basis  implementation,  this  is 

(E-23) 


E.5  Bernoulli  Response 

Suppose  the  response  Y  E  {0,1}  is  Bernoulli  with  Pr[T  =  1]  —  g  =  1  —  Pr[Y  =  0].  The 
Bernoulli  probability  is 

/(y)  =  /^y(l  -  M)1_y  =  exp  [y  log 7^  +  logCl  -  (E-24) 

so  a  =  1,  c  —  0,  and  there  is  no  nuisance  parameter.  Furthermore,  9  —  log(/r/(l  —  /r))  and 
g  =  l/(l  +  e_e),  and  so  b(9)  =  —  log(l  —  g)  =  log(l  +  e0).  Note  that  E  Y  —  b' (0) 

=  e0/ (l  +  e0)  —  g  and  Var  Y  —  v(g)  —  b"(9)  —  e~9 / (l  +  e-0)  =  g(l  —  g)  as  expected. 

With  g  —  9  and  g  —  G(9),  the  canonical  link  for  Bernoulli  response  is  seen  to  be  the  logistic  cdf 
G(j])  —  1/(1  -I-  e~v)  =  g.  Note  that  g(j])  —  g(l  —  g)  —  v(g ),  so  wFi  —  gt{  1  —  gt)  and 
1 iFi  —  —  /rj)/(/Uj(l  —  gt )).  The  resulting  model  is  logistic  regression,  or  the  logit  model. 

For  an  arbitrary  link  cdf  G.  take  g  —  Xfi.  g  —  G(X/3),  v(g)  —  g(l  —  g),  and  g(r 7)  =  g(X[f). 

Use  of  the  standard  normal  cdf  G  —  <t>  with  pdf  g  =  <p  gives  the  probit  model. 
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Because  the  likelihood  function  is  L  —  fl  hf1  •  (1  —  Mt)1  it  follows  that  Lfuli  =  1  for  the 
Bernoulli  model  and  the  deviance  is  A  =  —2  log  L. 

As  an  example,  consider  the  usual  two-parameter  model  with  predictor  (xl7 ... ,  xn)  and  response 
(y1; ... ,  yn).  The  increment  8  =  (d0,  d x)  is  the  solution  of  MS  =  A ,  where 


M  =  XtWX  = 


-  Zwj 
SwjXj 


ZwjXj- 

ZwjX?. 


and  A  =  XlWU 


ZwiUi 

ZwiUiXi 


(E-25) 


To  do  the  Fisher  update  of  section  E.2,  calculate  the  linear  response  gt  —  b0  +  bxxx,  mean 
gx  =  G(rji),  derivative  gx  —  g(j]i),  variance  vx  —  gx(l  —  gx),  transformed  response  ux  —  uFi 
—  (.Xi  ~  d-d/dii  weight  wx  —  wFi  -  gf  /vx,  and  weighted  transformed  response  WjU*  =  wFiuFi 

=  (Xi  -  v-dgd Vi- 

For  the  Newton-Raphson  update,  wDi  =  (yt  —  y()(,g'ty  —  gfv-)/vf  and  wNi  —  wFi  —  wDi  and 
u-i  =  uNi  =  wFi/wNi  ■  (y*  -  gd/gt.  Then  wx  =  wNi  and  w ^  =  wNiuNi  =  wFiuFi. 

For  the  canonical  link  wx  —  wFi  —  gL  —  vx  and  w{ax  =  wFiuFi  —  Xi  ~  di- 
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